1.  Introduction

A common and very old problem in statistics is the estimation of an unknown
probability density function. In particular, the problem of nonparametric
probability density estimation has been studied for many years. Summaries of
results on nonparametric density estimation based on complete (uncensored)
random samples have been listed recently by several authors, including Fryer [18],
Tapia and Thompson [52], Wertz and Schneider [60], and Bean and Tsokos [2].
Also, a review of results for censored samples has been given by Padgett and
McNichols [39]. In addition to its importance in theoretical statistics,
nonparametric density estimation has been utilized in hazard analysis, life
testing, and reliability, as well as in the areas of nonparametric
discrimination and high energy physics [20].

The purpose of this article is to present the different types of nonparametric
density estimates that have been proposed for the situation that the sample data
are censored or incomplete. This type of data arises in many life testing
situations and is common in survival analysis problems (see Lagakos [25] and
Kalbfleisch and Prentice [21], for example). In many of these situations, some
observations may be censored or truncated from the right, referred to as
right-censorship. This occurs often in medical trials when the patients may enter
treatment at different times and then either die from the disease under
investigation or leave the study before its conclusion. A similar situation may
occur in industrial life testing when items are removed from the test at random
times for various reasons. It is of interest to be able to estimate
nonparametrically the unknown density of the lifetime random variable from this
type of data without ignoring or discarding the right-censored information. The
development of such nonparametric density estimators has only occurred in the
past six or seven years and the avenues of investigation have been similar to
those for the complete sample case, except that the problems are generally more
difficult mathematically.

The various types of estimators from right-censored samples that have
been proposed in the literature will be indicated and briefly discussed here.
They include histogram-type estimators, kernel-type estimators, maximum
likelihood estimators, Fourier series estimators, and Bayesian estimators. In
addition, since the hazard rate function estimation problem is closely related
to the density estimation problem, various types of nonparametric hazard rate
estimators from right-censored data will be briefly mentioned. Due to their
computational simplicity and other properties, the kernel-type density
estimators will be emphasized, and some examples will be given in Section 7.

The required definitions and notation are presented in the next section,
before the discussion of the various estimators.

2.  Notation  and  Preliminaries 

Let $X_1^0, X_2^0, \ldots, X_n^0$ denote the true survival times of $n$ items or
individuals which are censored on the right by a sequence $U_1, U_2, \ldots, U_n$
which in general may be either constants or random variables. It is assumed that
the $X_i^0$'s are nonnegative independent identically distributed random variables
with common unknown distribution function $F^0$. For the problem of density
estimation, it is assumed that $F^0$ is absolutely continuous with density $f^0$.
The corresponding hazard rate function is defined by $r^0 = f^0/(1-F^0)$.

The observed right-censored data are denoted by the pairs $(X_i, \Delta_i)$,
$i = 1, \ldots, n$, where

$$X_i = \min\{X_i^0, U_i\}, \qquad
\Delta_i = \begin{cases} 1 & \text{if } X_i^0 \le U_i, \\ 0 & \text{if } X_i^0 > U_i. \end{cases}$$

Thus, it is known which observations are times of failure or death and which
ones are censored or loss times. The nature of the censoring mechanism depends
on the $U_i$'s: (i) If $U_1, \ldots, U_n$ are fixed constants, the observations are
time-truncated; if all $U_i$ are equal to the same constant, then the case
of Type I censoring results. (ii) If all $U_i = X^0_{(r)}$, the $r$th order
statistic of $X_1^0, \ldots, X_n^0$, then the situation is that of Type II censoring.
(iii) If $U_1, \ldots, U_n$ constitute a random sample from a distribution $H$ (which
is usually unknown) and are independent of $X_1^0, \ldots, X_n^0$, then $(X_i, \Delta_i)$,
$i = 1, 2, \ldots, n$, is called a randomly right-censored sample.
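The three censoring mechanisms differ only in how the limits $U_i$ are chosen. The following minimal sketch builds the observed pairs $(X_i, \Delta_i)$ from given lifetimes and limits; the function names `censor`, `type_one`, and `type_two` are illustrative and do not come from the literature cited here.

```python
def censor(lifetimes, limits):
    """Return the observed pairs (X_i, Delta_i) from true lifetimes X_i^0 and limits U_i."""
    return [(min(x0, u), 1 if x0 <= u else 0) for x0, u in zip(lifetimes, limits)]


def type_one(lifetimes, c):
    """Type I censoring: every U_i equals the same fixed constant c."""
    return censor(lifetimes, [c] * len(lifetimes))


def type_two(lifetimes, r):
    """Type II censoring: every U_i equals the r-th order statistic of the lifetimes."""
    cutoff = sorted(lifetimes)[r - 1]
    return censor(lifetimes, [cutoff] * len(lifetimes))
```

For random right-censorship (iii), the same `censor` function would be called with limits drawn independently of the lifetimes from some distribution $H$.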

The random censorship model (iii) is attractive because of its mathematical
convenience. Many of the estimators discussed later are based on this model.
Assuming (iii), $\Delta_1, \ldots, \Delta_n$ are independent Bernoulli random variables
and the distribution function $F$ of each $X_i$, $i = 1, \ldots, n$, is given by
$1-F = (1-F^0)(1-H)$. Under the Koziol and Green [24] model of random censorship,
which is the proportional hazards assumption of Cox [7], it is assumed that there
is a positive constant $\beta$ such that $1-H = (1-F^0)^\beta$. Then by a result of
Chen, Hollander, and Langberg [6], the pairs $(X_i^0, U_i)$, $i = 1, \ldots, n$, follow
the proportional hazards model if and only if $(X_1, \ldots, X_n)$ and
$(\Delta_1, \ldots, \Delta_n)$ are independent. This Koziol-Green model of random
censorship arises in several situations (Efron [11], Csörgő and Horváth [8],
Chen, Hollander and Langberg [6]). Note that $\beta$ is a censoring coefficient since
$\alpha = P(X_i^0 \le U_i) = (1+\beta)^{-1}$, which is the probability
of an uncensored observation.
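The expression for the probability of an uncensored observation can be checked directly. The short derivation below is a sketch that assumes $F^0$ is continuous (so that $H$ is continuous as well under the Koziol-Green relation) and uses the substitution $v = F^0(t)$:

```latex
\alpha = P(X_i^0 \le U_i)
       = \int_0^\infty [1 - H(t)] \, dF^0(t)
       = \int_0^\infty [1 - F^0(t)]^{\beta} \, dF^0(t)
       = \int_0^1 (1 - v)^{\beta} \, dv
       = \frac{1}{1 + \beta} .
```

Thus $\beta = 0$ corresponds to no censoring, and larger values of $\beta$ give heavier censoring.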

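Based on the censored sample $(X_i, \Delta_i)$, $i = 1, \ldots, n$, a popular estimator of
the survival probability $S^0(t) = 1-F^0(t)$ at $t \ge 0$ is the product-limit
estimator, proposed by Kaplan and Meier [22] as the "nonparametric maximum
likelihood estimator" of $S^0$. This estimator was shown to be "self-consistent"
by Efron [11]. Let $(Z_i, \Delta'_i)$, $i = 1, \ldots, n$, denote the ordered $X_i$'s along with
their corresponding $\Delta_i$'s. A value of the censored sample will be denoted by
the corresponding lower case letters $(x_i, \delta_i)$ or $(z_i, \delta'_i)$ for the unordered
or ordered sample, respectively. The product-limit estimator of $S^0$ is defined
by [11]

$$\hat P_n(t) = \begin{cases}
1, & 0 \le t \le Z_1, \\
\prod_{i=1}^{k-1} \left( \dfrac{n-i}{n-i+1} \right)^{\Delta'_i}, & t \in (Z_{k-1}, Z_k], \; k = 2, \ldots, n, \\
0, & t > Z_n .
\end{cases}$$

Denote the product-limit estimator of $F^0(t)$ by $\hat F_n(t) = 1 - \hat P_n(t)$, and let
$s_j$ denote the jump of $\hat P_n$ (or $\hat F_n$) at $Z_j$, that is,

$$s_j = \begin{cases}
1 - \hat P_n(Z_2), & j = 1, \\
\hat P_n(Z_j) - \hat P_n(Z_{j+1}), & j = 2, \ldots, n-1, \\
\hat P_n(Z_n), & j = n .
\end{cases} \tag{2.1}$$

Note that $s_j = 0$ if and only if $\Delta'_j = 0$, $j < n$, that is, if $Z_j$ is a censored
observation.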
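The product-limit estimator and its jumps are straightforward to compute. The sketch below is one possible implementation under the assumption that there are no tied observations; the function name `product_limit` and its return layout are illustrative choices, not part of the cited work.

```python
def product_limit(times, deltas):
    """Product-limit estimate from observed times X_i and indicators Delta_i.

    Returns (z, d, surv, jumps) where z and d are the ordered times and
    indicators, surv[k] is the estimate of S^0 on (z[k-1], z[k]] (surv[0] is
    the value 1 on [0, z[0]]), and jumps[j] is the mass s_j placed at z[j].
    The largest observation receives all remaining mass, as in (2.1).
    """
    pairs = sorted(zip(times, deltas))
    n = len(pairs)
    z = [t for t, _ in pairs]
    d = [e for _, e in pairs]
    surv = [1.0]
    for i in range(1, n):  # i is the 1-based rank of the factor multiplied in
        factor = (n - i) / (n - i + 1) if d[i - 1] == 1 else 1.0
        surv.append(surv[-1] * factor)
    jumps = [surv[j] - surv[j + 1] for j in range(n - 1)] + [surv[-1]]
    return z, d, surv, jumps
```

With no censoring every jump equals $1/n$, so the estimate reduces to one minus the empirical distribution function; a censored observation other than the largest receives a jump of zero.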

The product-limit estimator has played a central role in the analysis of
censored survival data (Miller [36]), and its properties have been studied
extensively by many authors, for example, Breslow and Crowley [4], Földes,
Rejtő and Winter [15], and Wellner [59]. Many of the nonparametric density
estimators from right-censored data are naturally based on the product-limit
estimator, beginning with the histogram-type and kernel-type estimators.
One of the simplest nonparametric estimators of the density function
for randomly right-censored samples is the histogram estimator. Although
they are simple to compute, histogram estimators are not smooth and are
generally not suited to sophisticated inference procedures.

Estimation of the density function and hazard rate of survival time
based on randomly right-censored data was apparently first studied by Gehan
[19]. The life table estimate of the survival function was used to estimate
the density $f^0$ as follows: The observations $(X_i, \Delta_i)$, $i = 1, \ldots, n$, were
grouped into $k$ fixed intervals $[t_1, t_2), [t_2, t_3), \ldots, [t_{k-1}, t_k), [t_k, \infty)$,
with the finite widths denoted by $h_i = t_{i+1} - t_i$, $i = 1, \ldots, k-1$. Letting
$n'_i$ denote the number of individuals alive at time $t_i$, $L_i$ be the number of
individuals censored (lost or withdrawn from the study) in the interval
$[t_i, t_{i+1})$, and $d_i$ be the number of individuals dying or failing in the $i$th
interval (where time to death or failure is recorded from time of entry into
the study), define $\hat q_i = d_i/n_i$ and $\hat p_i = 1 - \hat q_i$, where
$n_i = n'_i - L_i/2$. Therefore, $\hat q_i$ is an estimate of the probability of dying
or failing in the $i$th interval, given exposure to risk in the $i$th interval.
Let $\hat\Pi_i = \hat p_{i-1} \hat\Pi_{i-1}$, where $\hat\Pi_1 \equiv 1$. Gehan's estimate of
$f^0$ at the midpoint $t_{mi}$ of the $i$th interval is then

$$\hat f(t_{mi}) = \frac{\hat\Pi_i - \hat\Pi_{i+1}}{h_i} = \frac{\hat\Pi_i \hat q_i}{h_i}, \qquad i = 1, \ldots, k-1 .$$

An expression for estimating the large sample approximation to the variance of
$\hat f(t_{mi})$ was also given in [19].
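The life table calculation above can be sketched in a few lines. The function name `gehan_density` and the list-based inputs are assumptions made for illustration; the inputs are the per-interval counts $n'_i$, $L_i$, $d_i$ and widths $h_i$ for the finite intervals only.

```python
def gehan_density(n_alive, censored, deaths, widths):
    """Life-table density estimates at the midpoints of the finite intervals.

    n_alive[i]  : number alive at the start of interval i (n'_i)
    censored[i] : number censored in interval i (L_i)
    deaths[i]   : number dying in interval i (d_i)
    widths[i]   : interval width (h_i)
    """
    surv = 1.0  # running product Pi_i, starting from Pi_1 = 1
    estimates = []
    for n_prime, lost, d, h in zip(n_alive, censored, deaths, widths):
        n_eff = n_prime - lost / 2.0   # effective number exposed to risk
        q = d / n_eff                  # conditional probability of failure
        estimates.append(surv * q / h)
        surv *= 1.0 - q                # Pi_{i+1} = p_i * Pi_i
    return estimates
```

The half-count adjustment $L_i/2$ treats censored individuals as exposed, on average, for half of the interval.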

Using the product-limit estimator $\hat F_n$ of $F^0$, Földes, Rejtő, and Winter [16]
defined a histogram estimator of $f^0$ on a specified interval $[0,T]$, $T > 0$.
For integer $n > 0$, let $0 = t_0^{(n)} < t_1^{(n)} < \cdots < t_{\nu_n}^{(n)} = T$ be a
partition of $[0,T]$ into $\nu_n$ subintervals $I_i^{(n)}$, where

$$I_i^{(n)} = \begin{cases}
[t_{i-1}^{(n)}, t_i^{(n)}), & 1 \le i < \nu_n, \\
[t_{\nu_n - 1}^{(n)}, T], & i = \nu_n .
\end{cases}$$

Then their histogram estimator is

$$\hat f_n(x) = \frac{\hat F_n(t_i^{(n)}) - \hat F_n(t_{i-1}^{(n)})}{t_i^{(n)} - t_{i-1}^{(n)}}, \qquad x \in I_i^{(n)} . \tag{3.1}$$

If $x \notin [0,T]$, $\hat f_n(x)$ is either undefined or defined arbitrarily. Notice that
if none of the observations are censored, $\hat F_n$ reduces to the empirical
distribution function, and (3.1) becomes the usual histogram estimator with respect
to the given partition. The strong uniform consistency of $\hat f_n$ on $[0,T]$ was
proven by Földes, Rejtő, and Winter [16] under some conditions on the partition,
provided that $f^0$ was continuous on $[0,T]$ and $H(T^-) < 1$, where $H(T^-)$
denotes the limit from the left of $H$ at $T$. This last condition is common in
obtaining consistency properties under random right-censorship and ensures that
uncensored observations can be obtained from the entire interval of interest.
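A histogram estimator of this kind only needs differences of the product-limit distribution function over the cells of the partition. The following sketch assumes no tied observations and evaluates the distribution function as the mass strictly below its argument, so that cells are treated as half-open on the right; the names `pl_cdf` and `pl_histogram` are hypothetical.

```python
from bisect import bisect_left


def pl_cdf(times, deltas):
    """Return a function t -> product-limit estimate of F^0 (mass strictly below t)."""
    pairs = sorted(zip(times, deltas))
    n = len(pairs)
    z = [t for t, _ in pairs]
    surv = [1.0]
    for i in range(1, n):
        factor = (n - i) / (n - i + 1) if pairs[i - 1][1] == 1 else 1.0
        surv.append(surv[-1] * factor)
    surv.append(0.0)  # beyond the largest observation the survival estimate is 0

    def cdf(t):
        # bisect_left counts the ordered observations strictly less than t
        return 1.0 - surv[bisect_left(z, t)]

    return cdf


def pl_histogram(times, deltas, grid):
    """Cell heights: increments of the product-limit CDF divided by cell widths."""
    cdf = pl_cdf(times, deltas)
    return [(cdf(b) - cdf(a)) / (b - a) for a, b in zip(grid, grid[1:])]
```

When every indicator equals 1, the heights are the usual relative frequencies divided by the cell widths.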

Burke and Horváth [5] defined general density estimators which included
histogram-type and kernel-type estimators with appropriate choices of the
defining functions. They also obtained asymptotic distribution results for these
estimators. In fact, their results were obtained for the more general
situation of the $k$ independent competing risks model. When $k = 2$, this reduces
to the random right-censorship model.


The histogram estimator can be obtained as a special case of the kernel
density estimators. The kernel-type estimators have been perhaps the most
popular estimators in practice due to their relative computational simplicity,
smoothness, and other properties. Kernel-type estimators from randomly
right-censored data have been studied only since around 1978, beginning with the
work of Blum and Susarla [3]. The investigation of kernel estimators for
right-censored samples has been attempted along the same lines as for the complete
sample case. However, due to mathematical difficulties introduced by the
censoring, some of the analogous theory to the complete sample case has not yet
been obtained.

Blum and Susarla [3] generalized the complete sample results of Rosenblatt
[45] concerning maximum deviation of density estimates by the kernel method.
To define the Blum-Susarla density estimator, let $\{h_n\}$ be a positive sequence,
called the bandwidth sequence, such that $\lim_{n\to\infty} h_n = 0$, and let $N^+(x)$
denote the number of observed $X_j$'s that are greater than $x$. Define

$$H_n^*(x) = \prod_{j=1}^n \left\{ \frac{1 + N^+(X_j)}{2 + N^+(X_j)} \right\}^{[\Delta_j = 0,\; X_j \le x]},$$

where $[A]$ denotes the indicator function of the event $A$. By a modification
of the product-limit estimator, it can be shown that $H_n^*$ is a good estimate
of $H^* = 1-H$. For a kernel function $K$ satisfying certain conditions, the
Blum-Susarla density estimator is given by

$$f_n^*(x) = [n h_n H_n^*(x)]^{-1} \sum_{j=1}^n K((x - X_j)/h_n) [\Delta_j = 1] . \tag{3.2}$$

For example, $K$ can be a bounded density function with support in the interval
$[-A,A]$ for some $A > 0$ and absolutely continuous on $[-A,A]$ with derivative
$K'$ which is square integrable on $[-A,A]$. By following standard arguments,
$(f_n^* H_n^*)(x) = (n h_n)^{-1} \sum_{j=1}^n K((x - X_j)/h_n)[\Delta_j = 1]$ and $H_n^*(x)$
can be shown to be good estimators of $f^0(x) H^*(x)$ and $H^*(x)$, respectively.
This motivates the use of (3.2) as an estimator of $f^0(x)$.

Blum and Susarla also obtain limit theorems for the maximum over a
finite interval of a normalized deviation of the density estimator (3.2).
These results are useful for goodness-of-fit tests and tests of hypotheses about
the unknown lifetime density $f^0$.
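The Blum-Susarla estimator (3.2) divides a kernel smooth of the uncensored observations by an estimate of the censoring survival function. The sketch below follows that recipe; the function names and the choice of the Epanechnikov kernel (a bounded density supported on $[-1,1]$) are illustrative assumptions rather than choices made in [3].

```python
def epanechnikov(u):
    """A bounded kernel density with support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def blum_susarla_density(x, times, deltas, h, kernel=epanechnikov):
    """Kernel smooth of uncensored points divided by n * h * H*_n(x)."""
    n = len(times)

    def n_plus(t):
        # number of observed values strictly greater than t
        return sum(1 for v in times if v > t)

    # Product over censored observations not exceeding x: estimate of 1 - H at x.
    h_star = 1.0
    for xj, dj in zip(times, deltas):
        if dj == 0 and xj <= x:
            m = n_plus(xj)
            h_star *= (1.0 + m) / (2.0 + m)

    smooth = sum(kernel((x - xj) / h) for xj, dj in zip(times, deltas) if dj == 1)
    return smooth / (n * h * h_star)
```

Without censoring the correction factor equals 1 and the function returns the ordinary kernel estimate.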

It was conjectured by Blum and Susarla [3] that the kernel-type estimator

$$\hat f_n(x) = h_n^{-1} \int_{-\infty}^{\infty} K((x-t)/h_n) \, dF_n^*(t)$$

behaved in the same way as $f_n^*$, where $F_n^*$ was an estimator of $F^0$ such as the
product-limit estimator. In fact, Földes, Rejtő, and Winter [16] proved
uniform almost sure convergence of $\hat f_n$ to $f^0$ when $F_n^*$ was taken to be
$\hat F_n$. Specifically, one of their results was that
$\sup_{a<x<b} |\hat f_n(x) - f^0(x)| \to 0$ almost surely as $n \to \infty$ provided
$f^0$ was bounded and had a bounded derivative on $(a,b)$,
$-\infty \le a < b \le \infty$, $K$ was right-continuous and of bounded variation,
$h_n (n/\log n)^{1/8} \to \infty$, and $H(T_{F^0}) < 1$, where
$T_{F^0} = \sup\{x : F^0(x) < 1\}$. Again, the last condition ensured that observed
lifetimes in the entire support of $f^0$ would be available. It should be noted
that if no censoring is present, then

$$\hat f_n(x) = h_n^{-1} \int_{-\infty}^{\infty} K((x-t)/h_n) \, d\hat F_n(t) \tag{3.3}$$

reduces to the Parzen [43] estimator.

McNichols and Padgett [32] wrote (3.3) in the form

$$\hat f_n(x) = h_n^{-1} \sum_{j=1}^n s_j K((x - Z_j)/h_n), \tag{3.4}$$

where $s_j$ is given by (2.1). They considered the mean, variance, and mean
squared error of (3.4) under the Koziol-Green model of random censorship
described in Section 2. This model allowed the expected value of $\hat f_n(x)$ to be
evaluated by using the independence of $(X_1, \ldots, X_n)$ and
$(\Delta_1, \ldots, \Delta_n)$. In particular, if $K$ is a Borel function such that
$\sup_t |K(t)| < \infty$, $\int |K(t)| \, dt < \infty$, $\lim_{|t|\to\infty} |tK(t)| = 0$,
and $\int K(t) \, dt = 1$, then

$$E[\hat f_n(x)] = h_n^{-1} \int g_n(t) f(t) K((x-t)/h_n) \, dt
  + (1-\alpha) \rho_n(\alpha) h_n^{-1} E[K((x - Z_n)/h_n)], \tag{3.5}$$

where $\alpha = (1+\beta)^{-1}$, $b = 1-\alpha$,
$\rho_n(\alpha) = \prod_{i=1}^{n-1} [(n-i+b)/(n-i+1)]$, $g_n(t)$ is a finite sum over
$j$ of products of powers of $1-F(t)$ and $F(t)$, with
$\binom{n+b}{k} = (n+b)(n+b-1)\cdots(n+b-k+1)/k!$ denoting the generalized
binomial coefficient, $F = 1-(1-H)(1-F^0)$, and $f$ is a density for $F$.
Furthermore, it was shown that if $h_n \to 0$, then
$\lim_{n\to\infty} E[\hat f_n(x)] = f^0(x)$, $x > 0$. Thus, under the Koziol-Green
model, $\hat f_n(x)$ is asymptotically unbiased for $f^0(x)$, similar to the
complete sample case (the conditions on $K$ and $h_n$ are those imposed by Parzen
[43]). Second moment convergence was also obtained under the conditions that
$n h_n \to \infty$ and $b = P(\text{a censored observation}) < 1$ in addition to the
conditions required for asymptotic unbiasedness above [32].
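In the form (3.4) the kernel estimator is simply a weighted kernel sum with the product-limit jumps as weights. The following sketch assumes no ties and uses a Gaussian kernel purely for illustration; the names `pl_jumps`, `gaussian`, and `kernel_density` are hypothetical.

```python
import math


def pl_jumps(times, deltas):
    """Ordered observations and the product-limit jumps s_j of (2.1)."""
    pairs = sorted(zip(times, deltas))
    n = len(pairs)
    surv = [1.0]
    for i in range(1, n):
        uncensored = pairs[i - 1][1] == 1
        surv.append(surv[-1] * ((n - i) / (n - i + 1) if uncensored else 1.0))
    jumps = [surv[j] - surv[j + 1] for j in range(n - 1)] + [surv[-1]]
    return [t for t, _ in pairs], jumps


def gaussian(u):
    """Standard normal density, one kernel meeting Parzen-type conditions."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, times, deltas, h, kernel=gaussian):
    """Kernel estimate: (1/h) * sum_j s_j * K((x - Z_j) / h)."""
    z, s = pl_jumps(times, deltas)
    return sum(sj * kernel((x - zj) / h) for zj, sj in zip(z, s)) / h
```

Since the jumps all equal $1/n$ when no observation is censored, the function then returns the complete-sample Parzen estimate; censored points contribute nothing directly but shift weight to larger uncensored observations.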

For the kernel estimator (3.4), it is desirable to allow the data to play
a role in how much smoothing is done. Since, for a fixed $n$, $h_n$ is the
"smoothing constant," it would be reasonable to allow $h_n$ to be a function of
the right-censored sample. McNichols and Padgett [35] consider this type of
modification, which extends the work of Wagner [54] to censored data. This
modified kernel estimator is

$$\tilde f_n(x) = \Gamma_n^{-1} \sum_{j=1}^n s_j K((x - Z_j)/\Gamma_n), \tag{3.6}$$

where $\Gamma_n = \Gamma_n(X_1, \ldots, X_n)$ is some function of the censored data. For this
estimator it was shown that if $H(T_{F^0}) < 1$, $K$ has bounded variation,
$\lim_{|x|\to\infty} xK(x) = 0$, $\Gamma_n \to 0$ in probability (almost surely), and
$n^{1/2} (\log\log n)^{-1/2} \Gamma_n \to \infty$ in probability (almost surely), then
$\tilde f_n(x) \to f^0(x)$ in probability (almost surely) at each $x$ for which $f^0$
is continuous.

One choice of $\Gamma_n$ satisfying the above conditions is as follows:
If $\gamma_n = [n^\alpha]$, $1/2 < \alpha < 1$, where $[\cdot]$ denotes the greatest integer
function, let $D_{jn}$ be the distance from $Z_j$ to its $\gamma_n$-nearest neighbor among
$Z_1, \ldots, Z_n$, $1 \le j \le n$, and set $\Gamma_n = D_{jn}$ with probability $s_j$.

The practical choice of the bandwidth $h_n$ for a given censored sample is
a problem which must be addressed in order to calculate the kernel estimator.
For complete samples, several "data-based" procedures for selecting a "good"
value of $h_n$ for a given set of data have been proposed (see Scott and Factor
[46], for example). Among these procedures, the maximum likelihood approach
seems to be feasible when samples are right-censored. This will be discussed
further in Section 6.

With the exception of the expressions for the mean, $E[\hat f_n(x)]$, in (3.5)
and for $E[\hat f_n^2(x)]$ under the Koziol-Green model [32], very little has been
done concerning the small-sample properties of $\hat f_n$ or any of the other
kernel-type density estimators in the censored data case. Padgett and McNichols
[40] have performed Monte Carlo simulations for several parametric families of
lifetime distributions, uniform and exponential censoring distributions,
several kernel functions, and several bandwidths to determine the small-sample
behavior of $\hat f_n$ with respect to bias and mean squared error.

For estimating the hazard rate function from randomly right-censored
data, Földes, Rejtő, and Winter [16] considered estimators of the form

$$\hat r_n(x) = \frac{\hat f_n(x)}{1 - \hat F_n(x) + 1/n}, \qquad x \ge 0,$$

where $\hat f_n$ denoted either their histogram estimator (3.1) or their kernel-type
estimator (3.3). The $1/n$ in the denominator simply prevents dividing by zero.
Strong consistency results for $\hat r_n$ similar to those for (3.1) and (3.3) were
proven.

McNichols and Padgett [34] considered the kernel-type estimator of $r^0$
given by

$$\hat r_n(x) = h_n^{-1} \int K((x-t)/h_n) [1 - \hat F_n(t)]^{-1} \, d\hat F_n(t),
  \qquad x \ge 0 \text{ such that } F(x) < 1,$$

under the Koziol-Green model of random censorship. Expressions for $E[\hat r_n(x)]$
and $\mathrm{var}[\hat r_n(x)]$ were obtained, and it was shown that $\hat r_n(x)$ was
asymptotically unbiased, and converged in mean square and in probability to
$r^0(x)$, extending Watson and Leadbetter's [55,56] results.

Tanner and Wong [50] also studied a kernel-type estimator of $r^0$ based
on the ordered censored sample $(Z_j, \Delta'_j)$, $j = 1, \ldots, n$, given by

$$\hat r(x) = \sum_{j=1}^n (n-j+1)^{-1} \Delta'_j K_h(x - Z_j),
  \qquad x \ge 0 \text{ such that } F(x) < 1,$$

where $K$ was a symmetric integrable kernel with $K_h(y) = h^{-1} K(y/h)$. They
derived expressions for $E[\hat r(x)]$ and $\mathrm{var}[\hat r(x)]$ and proved, under the
conditions on $K$ stated by Watson and Leadbetter [55,56], that $\hat r(x)$ was
asymptotically unbiased if $h_n \to 0$ and $n h_n \to \infty$. The conditions assumed
here were essentially the same as those required by McNichols and Padgett [34],
except for the proportional hazards (Koziol-Green) model assumption which gave
somewhat different expressions for the mean and variance. The asymptotic variance
was also obtained, and Hájek's projection method was used to establish asymptotic
normality under conditions on $K$, $F^0$, $H$, and $h_n$. Tanner and Wong [51]
studied a class of estimators of the same general form as $\hat r(x)$ with $K_h$
replaced by $K_\theta$, where $\theta$ was a positive-valued "smoothing vector" chosen
to maximize a likelihood function. Hence, for this estimator the smoothing
parameters were chosen based on the observed data.
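A hazard estimator of the Tanner-Wong form smooths the increments $\Delta'_j/(n-j+1)$ of the cumulative hazard. The sketch below assumes the scaled kernel $K_h(y) = h^{-1} K(y/h)$ and no tied observations; the function names and the uniform kernel are illustrative choices only.

```python
def uniform(u):
    """A symmetric integrable kernel: the uniform density on [-1, 1]."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def tanner_wong_hazard(x, times, deltas, h, kernel=uniform):
    """Sum over uncensored ordered points of K((x - Z_j)/h) / (h * (n - j + 1))."""
    pairs = sorted(zip(times, deltas))
    n = len(pairs)
    total = 0.0
    for j, (zj, dj) in enumerate(pairs, start=1):  # j is the 1-based rank
        if dj == 1:
            total += kernel((x - zj) / h) / (h * (n - j + 1))
    return total
```

Each uncensored observation is weighted by the reciprocal of the number of individuals still at risk at that time, which is why late failures carry more weight than early ones.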

Tanner [49] considered a modified kernel-type estimator of $r^0$ in the form

$$\hat r_n(x) = (2R_k)^{-1} \sum_{j=1}^n (n-j+1)^{-1} \Delta'_j K((x - Z_j)/(2R_k)),$$

where $R_k$ was the distance from $x$ to the $k$th nearest of the uncensored
observations among $X_1, \ldots, X_n$. This estimator allowed the data to play a role
in determining the degree of smoothing that would occur in the estimate.
Assuming that $F^0$ and $H$ were continuous in a neighborhood about
$x$, $k = [n^\alpha]$, $1/2 < \alpha < 1$, where $[\cdot]$ was the greatest integer function,
that $K$ had bounded variation and compact support on the interval $[-1,1]$, and
that $r^0$ was continuous at $x$, it was shown that $\hat r_n(x)$ was strongly
consistent.

Blum and Susarla [3] considered the estimator (in the notation of Equation
(3.2))

$$r_n^*(x) = \frac{(f_n^* H_n^*)(x)}{S_n^*(x)}, \qquad x \ge 0,$$

where $S_n^*(x) = (\text{number of } Z_j\text{'s} > x)/n$. This estimator was also of the
kernel type, and limiting results similar to those stated for the density
estimator (3.2) were obtained for $r_n^*$.

Ramlau-Hansen [44] used martingale techniques to treat the general
multiplicative intensity model. His results are very general and include the
kernel estimators of hazard rate functions of Földes, Rejtő, and Winter [16] and
Yandell [61]. The martingale techniques yielded local asymptotic properties of
many of the hazard rate estimators in a simpler manner than classical procedures.

Finally, in a recent paper Liu and Van Ryzin [26] obtained a histogram
estimator of the hazard rate function from randomly right-censored data based
on spacings in the order statistics. They showed the estimator to be uniformly
consistent in a bounded interval and asymptotically normal under suitable
conditions. An efficiency comparison of their estimator with the kernel
estimator of hazard rate was also given. Also, Liu and Van Ryzin [27] gave the
large sample theory for the normalized maximal deviation of a hazard rate
estimator under random censoring which was based on a histogram estimate of the
subsurvival density of the uncensored observations.

4.  Likelihood  Methods 

One approach to estimating a density function nonparametrically is that
of maximum likelihood. Nonparametric maximum likelihood estimates of a
probability density function do not exist in general. That is, the likelihood
function for a complete sample is unbounded over the class of all possible
densities. However, by suitably restricting the class of densities, a
nonparametric maximum likelihood estimator (MLE) may be found within the
restricted class. For complete samples, the maximum likelihood estimator of a
density $g$ was given by Barlow, Bartholomew, Bremner and Brunk [1] if $g$ was
assumed to be either decreasing (nonincreasing) or unimodal with known mode.
Wegman [57,58] assumed unimodality with unknown mode and found the MLE of the
density and studied its properties for complete samples.

McNichols and Padgett [33] studied maximum likelihood estimation of
decreasing or unimodal densities based on arbitrarily right-censored data.
The censoring variables $U_1, \ldots, U_n$ could be either constants or continuous
random variables. They first assumed that $f^0$ was decreasing (nonincreasing)
on $[0,\infty)$ and let $\mathcal{F}_D$ be the set of distributions with decreasing
left-continuous densities on $[0,\infty)$. For the ordered censored observations
$(z_i, \delta'_i)$, $i = 1, \ldots, n$, the likelihood function was written as

$$L(f^0) = \prod_{i=1}^n [f^0(z_i)]^{\delta'_i} [S^0(z_i)]^{1-\delta'_i},$$

where $S^0 = 1-F^0$. It was shown that a maximum likelihood estimator of $f^0$
must be a step function.

The estimator was found by maximizing the likelihood function $L(f^0)$ over
$\mathcal{F}_D$ subject to the decreasing density constraint. Equivalently, the
constrained optimization problem to be solved was

$$\max_{y_1, \ldots, y_n} \; \sum_{i=1}^n \Big\{ \delta'_i \log y_i
  + (1-\delta'_i) \log\Big[ 1 - \sum_{j=1}^i y_j (z_j - z_{j-1}) \Big] \Big\}
  \quad \text{subject to } y_1 \ge y_2 \ge \cdots \ge y_n \ge 0,$$

where $z_0 = 0$. This function to be maximized was shown to be concave and the
problem was shown to have a unique solution, say $y_1^*, y_2^*, \ldots, y_n^*$. Then a
density of the form

$$f^*(x) = \begin{cases}
0, & x \le 0, \\
y_j^*, & z_{j-1} < x \le z_j, \; j = 1, \ldots, n+1, \\
0, & x > z_{n+1},
\end{cases}$$

was a maximum likelihood estimator of $f^0$, where $y_{n+1}^*$ was any value less than
or equal to $y_n^*$, and $z_{n+1}$ $(> z_n)$ were chosen so that

$$\sum_{j=1}^{n+1} y_j^* (z_j - z_{j-1}) = 1 .$$
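The objective of the constrained problem is easy to evaluate for any candidate step heights, which is useful when checking a numerical optimizer. The sketch below only evaluates the objective and the monotonicity constraint; it does not solve the problem, and the names `decreasing_loglik` and `is_decreasing` are hypothetical. It assumes the heights attached to uncensored points are positive and that the accumulated mass stays below 1 at every censored point, so that the logarithms are defined.

```python
import math


def decreasing_loglik(y, z, d):
    """Log-likelihood of step heights y for ordered times z and indicators d.

    An uncensored point contributes log y_i; a censored point contributes the
    log of the survival function at z_i, i.e. log(1 - mass up to z_i).
    """
    total = 0.0
    mass = 0.0
    prev = 0.0  # z_0 = 0
    for yi, zi, di in zip(y, z, d):
        mass += yi * (zi - prev)  # area of the step on (z_{i-1}, z_i]
        prev = zi
        if di == 1:
            total += math.log(yi)
        else:
            total += math.log(1.0 - mass)
    return total


def is_decreasing(y):
    """Check y_1 >= y_2 >= ... >= y_n >= 0."""
    return all(a >= b for a, b in zip(y, y[1:])) and y[-1] >= 0.0
```

Because the objective is concave, a general-purpose constrained optimizer applied to this function over the monotone cone would be one way to approximate the solution.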


Similarly, $f^0$ was estimated by maximum likelihood assuming that $f^0$ was
increasing (nondecreasing) on $[0,M]$, $M > 0$ known. Then, if $M$ denoted the
known mode of the unknown unimodal density, the two maximum likelihood estimators
on $[0,M]$ and on $(M,\infty)$ found as above could be combined to estimate the
unimodal density. If $f^0$ was assumed to be unimodal with unknown mode $M$, then
McNichols and Padgett [33] applied the above procedure for known mode, assuming
$z_{j-1} < M < z_j$ for each $j = 1, \ldots, n$, obtaining $n$ solutions for $f^0$. These $n$
solutions gave $n$ corresponding values of the likelihood function. The maximum
likelihood estimator of $f^0$ was then taken to be the solution with the largest
of the $n$ likelihood values, analogous to Wegman's [57,58] procedure for complete
samples.


Another approach to the problem of nonparametric maximum likelihood
estimation of a density from complete samples was proposed by Good and Gaskins
[20]. This method allowed any smooth integrable function on the interval of
interest $(a,b)$ (which may be finite or infinite) as a possible estimator, but
added a "penalty function" to the likelihood. The penalty function penalized
a density for its lack of smoothness, so that a very "rough" density would have
a smaller likelihood than a "smooth" density, and hence, would not be admissible.
De Montricher, Tapia, and Thompson [9] proved the existence and uniqueness of
the maximum penalized likelihood estimator (MPLE) for complete samples.

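Lubecke and Padgett [30] assumed that the sample was arbitrarily right-censored,
$(x_i, \delta_i)$, $i = 1, \ldots, n$, and showed the existence and uniqueness of a solution to
the problem:

$$\text{maximize } L(g) \text{ subject to } g(t) \ge 0 \text{ for all } t \in \Omega, \;
  \int_\Omega g(t) \, dt = 1, \text{ and } g \in H(\Omega), \tag{4.1}$$

where $L(g) = \prod_{i=1}^n [g(x_i)]^{\delta_i} [1 - G(x_i)]^{1-\delta_i} \exp[-\Phi(g)]$,
$\Phi(g)$ is the penalty term, $\Omega$ is a finite or infinite interval, $H(\Omega)$ is a
manifold, and $G$ is the distribution function for density $g$. In particular,
letting $u = g^{1/2}$ and using Good and Gaskins' [20] first penalty function, the
problem (4.1) becomes:

$$\text{maximize } L(u) = \prod_{i=1}^n [u(x_i)]^{\delta_i}
  \Big[ 1 - \int_0^{x_i} u^2(t) \, dt \Big]^{\frac{1}{2}(1-\delta_i)}
  \times \exp\Big[ -2\alpha \int_0^\infty (u'(t))^2 \, dt \Big], \tag{4.2}$$

where $x_i > 0$, $i = 1, \ldots, n$, $\int_0^\infty u^2(t) \, dt = 1$, and $u(t) \ge 0$, $t > 0$.

Let $x_{-i} = x_i$ and $\delta_{-i} = \delta_i$, $i = 1, \ldots, n$, and define $u(x) = u(|x|)$ for
$x \in R \setminus \{0\}$ and $u(0) = \lim_{x \to 0} u(x)$. Then define the following problem:

$$\text{maximize } L(u) = \prod_{|i|=1}^n [u(x_i)]^{\delta_i}
  \Big[ 2 - \int_{-x_i}^{x_i} u^2(t) \, dt \Big]^{\frac{1}{2}(1-\delta_i)}
  \times \exp\Big[ -2\alpha \int_{-\infty}^{\infty} (u'(t))^2 \, dt \Big], \tag{4.3}$$

where $\int_{-\infty}^{\infty} u^2(t) \, dt = 2$, $u$ belongs to the set
$\{g \in H^1(-\infty,\infty) : g(x) = g(-x)\}$ of symmetric functions,
and $H^1(-\infty,\infty)$ is the Sobolev space of real-valued functions
such that the function and its first derivative are square
integrable.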

If $u^*$ solves (4.3), then it can be shown that $u_+(t) = u^*(t)$, $t \ge 0$,
and $u_+(t) = 0$, $t < 0$, solves (4.2). Lubecke and Padgett [30] showed that a
solution to (4.3) was a function $u^*$ which solves a linear integral equation,
(4.4), whose integral term involves
$\sinh[(\lambda/2\alpha)^{1/2}(t-\tau)] \, u^*(\tau) \, d\tau$ and whose forcing function
$C(t; \mathbf{x}, \boldsymbol{\delta}, \lambda)$ is a sum over $|i| = 1, \ldots, n$ of terms in
$\exp(-(\lambda/2\alpha)^{1/2} |t - x_i|)$, $\exp(-(\lambda/2\alpha)^{1/2} |t + x_i|)$, and
$\exp(\pm(\lambda/2\alpha)^{1/2} t)$, for a $\lambda > 0$. The integral equation (4.4) can be
transformed to a second-order differential equation whose solution $u^*$ can be
numerically obtained. Then $(u_+)^2$ is the MPLE of the density $f^0$ based on the
first penalty function of Good and Gaskins [20].


The nonparametric maximum likelihood estimation of the hazard rate function
$r^0$ based on the arbitrarily right-censored sample $(x_i, \delta_i)$, $i = 1, 2, \ldots, n$, was
considered by Padgett and Wei [41] in the class of increasing failure rate (IFR)
distributions. The techniques of order restricted inference were used to obtain
the estimator following an argument similar to that of Marshall and Proschan [31]
for the complete sample case. A closed form maximizer of the likelihood function
of $r^0$ subject to the IFR condition was found to be a nondecreasing step function.
Small sample properties of their estimator were indicated by a Monte Carlo study.
Mykytyn and Santner [37] considered the same problem of maximum likelihood
estimation of $r^0$ under arbitrary right censorship assuming either IFR, decreasing
failure rate (DFR), or U-shaped failure rate. Their estimator was essentially
equivalent to Padgett and Wei's estimator and was shown to be consistent by using
a total time on test transform. This estimator was maximum likelihood in the
Kiefer-Wolfowitz sense.

Friedman [17] also considered maximum likelihood estimation from survival
data. Let n survival times be observed over a time period divided into I(n)
intervals and assume that the hazard rate function of the time to failure of
individual j, $r_j(t)$, is constant and equal to $r_{ij} > 0$ on the ith interval.
The maximum likelihood estimate $\hat{A}$ of the vector
$A = \{\log r_{ij} : j = 1,\ldots,n;\ i = 1,\ldots,I(n)\}$ gave a simultaneous
estimate of the hazard rate function. Friedman gave conditions for the
existence of $\hat{A}$ and studied the asymptotic properties of linear
functionals of $\hat{A}$ in the general case when the true hazard rate is not
a step function. This piecewise smooth estimate of the hazard rate can be
regarded as giving piecewise smooth density estimates.
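
As a simplified illustration of a piecewise constant hazard fit, the sketch
below assumes that all individuals share one hazard value per interval, which
is narrower than Friedman's setting with individual-specific rates. Under that
assumption the maximum likelihood estimate on each interval is the number of
observed failures divided by the total time at risk in the interval. The
function name, the argument names, and the convention that the last interval
is unbounded are all choices made for this sketch.

```python
def piecewise_hazard(times, events, cuts):
    """Occurrence/exposure estimate of a piecewise constant hazard.

    times  : observed (possibly censored) positive survival times
    events : 1 if the time is an observed failure, 0 if censored
    cuts   : increasing left endpoints of the intervals, starting at 0;
             the last interval is (cuts[-1], infinity)
    Assumes one common hazard per interval (an assumption of this sketch).
    """
    k = len(cuts)
    deaths = [0] * k
    exposure = [0.0] * k
    for t, d in zip(times, events):
        for i in range(k):
            lo = cuts[i]
            hi = cuts[i + 1] if i + 1 < k else float("inf")
            if t <= lo:
                break  # the subject left the study before this interval
            exposure[i] += min(t, hi) - lo  # time at risk inside (lo, hi]
            if d and t <= hi:
                deaths[i] += 1  # failure falls in this interval
    return [deaths[i] / exposure[i] if exposure[i] > 0 else float("nan")
            for i in range(k)]
```

For times 1, 2, 3 with the middle one censored and intervals (0, 2] and
(2, infinity), the first interval has one failure over five units of exposure
and the second has one failure over one unit.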
Some Other Methods


Nonparametric density estimators based on Fourier series representations
have been proposed for censored data. Kimura [23] considered the problem of
estimating density functions and cumulative distribution functions by using
estimated Fourier series. A method for generating a useful class of
orthonormal families was first developed for the complete sample case and the
results were then generalized to the case of censored data. Variance
expressions for the quantity $\int \psi(x)\, d\hat{P}_n(x)$ were obtained, where
$\psi$ was chosen so that the variance existed and $\hat{P}_n$ was the
product-limit estimator. Finally, Monte Carlo simulation was used to test the
methods developed.
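
A series estimator of this kind can be sketched as follows. Each Fourier
coefficient is an integral of a basis function against the estimated
distribution, which for a step function reduces to a weighted sum over its
jump points. The cosine basis on [0, T], the truncation point m, and all
function names are assumptions made for illustration; Kimura's orthonormal
families are not reproduced here. The weights s are supplied by the caller:
for censored data they would be the jumps of the product-limit estimator, and
for a complete sample they are 1/n.

```python
import math

def cosine_basis(k, x, T):
    """k-th function of the orthonormal cosine basis on [0, T]."""
    if k == 0:
        return 1.0 / math.sqrt(T)
    return math.sqrt(2.0 / T) * math.cos(k * math.pi * x / T)

def series_coefficients(z, s, T, m):
    """a_k = sum_j s_j * phi_k(z_j) for k = 0..m.

    This is the integral of phi_k against a step distribution function
    that jumps by s_j at z_j.
    """
    return [sum(sj * cosine_basis(k, zj, T) for zj, sj in zip(z, s))
            for k in range(m + 1)]

def series_density(x, coeffs, T):
    """Truncated series estimate sum_k a_k * phi_k(x); may be negative."""
    return sum(a * cosine_basis(k, x, T) for k, a in enumerate(coeffs))

# Hypothetical complete sample on [0, 4] with equal weights 1/n.
z = [0.5, 1.0, 2.5, 3.0]
s = [0.25] * 4
coeffs = series_coefficients(z, s, T=4.0, m=3)
```

When the weights sum to one, the estimate integrates to one over [0, T],
because only the constant basis function has a nonzero integral. A truncated
series is not guaranteed to be nonnegative, which is a known drawback of this
family of estimators.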

Tarter [53] obtained a new maximum likelihood estimator of the survival
function S° by using Fourier series estimators of the probability densities
of the uncensored observations and censored observations separately. That is,
the density estimates were $\hat{f}$ and $\tilde{f}$, obtained from the $n_1$
observed uncensored $X_i$'s and the $n_2$ observed censored $X_i$'s,
respectively, where $n_1 + n_2 = n$. It was shown that as $n \to \infty$ the new
likelihood estimator approached the product-limit estimator from above. It
should be noted that the series-type density estimators $\hat{f}$ and
$\tilde{f}$ used here were obtained by the usual complete-sample formulas.

The final series-type estimator to be mentioned here is the general
estimator of the density in the k competing risks model of Burke and Horvath [5].
It could be considered as a Fourier-type estimator by appropriate choices of
the form of the defining functions.

Another method that has been used for estimating hazard rate and density
functions is that of Bayesian nonparametric estimation. Since the work of
Ferguson [12,13], many authors have been concerned with the Bayesian nonparametric
estimation of a distribution function or related functions with respect to the
Dirichlet process or other random probability measures as prior distributions.
For censored data Susarla and Van Ryzin [47,48] considered the estimation of the
survival function with respect to Dirichlet process priors, while Ferguson and
Phadia [14] used neutral to the right processes as prior distributions.

Padgett and Wei [42] obtained Bayesian nonparametric estimators of the
survival function, density function, and hazard rate function of the lifetime
distribution using pure jump processes as prior distributions on the hazard rate
function, assuming an increasing hazard rate. Both complete and right-censored
samples were considered. The pure jump process prior was appealing because it
had an intuitive physical interpretation as shocks occurring randomly in time
that caused the hazard rate to increase by a constant small amount at each shock.
It also closely approximated the (random) increasing failure rate by a
(random) step function.

Dykstra  and  Laud  [10]  also  considered  a  prior  distribution  on  the  hazard 
rate  function  in  order  to  produce  smooth  nonparametric  Bayes  estimators.  Their 
prior  was  an  extended  gamma  process  and  the  posterior  distribution  was  found 
for  right-censored  data.  The  Bayes  estimators  of  the  survival  and  hazard  rate 
functions with respect to a squared error loss were obtained in terms of a
one-dimensional integral.

Lo  [28,29]  estimated  densities  and  hazard  rates,  as  well  as  other  general 
rate  functions,  from  a  Bayesian  nonparametric  approach  by  constructing  a  prior 
random density as a convolution of a kernel function with the Dirichlet random
probability. His estimator of the density with respect to squared error loss
was essentially a mixture of an initial or prior guess at the density and a
sample probability density function. His technique can be used for complete or
censored samples.
Of the many types of nonparametric density estimators available, probably
the most often used in practice are the kernel-type estimators. They are
relatively simple to calculate and can produce smooth, pleasing results. In
this section numerical examples will be given for the kernel estimator (3.4)
and the modified estimator (3.6) with the nearest neighbor-type procedure for
selecting the bandwidth.

One problem in using kernel density estimators is that of how to choose
the "best" value of the bandwidth $h_n$ to use with a given set of data. This
question has been addressed in the complete sample case by several authors
(see Scott and Factor [46], for example), and "data-based" choices of $h_n$ have
been proposed using maximum likelihood, mean squared error, or other criteria.
For the estimator (3.4) no expressions for the mean squared error for finite
sample sizes exist at present, except for those very complicated ones given by
McNichols and Padgett [32] under the Koziol-Green model. Hence, selection of
$h_n$ to minimize mean squared error does not seem to be feasible. However, Monte
Carlo simulation results of Padgett and McNichols [40] indicate that at each x
there is a value of $h_n$ which minimizes the estimated mean squared error of
$\hat{f}_n(x)$ in (3.4). Similar results were also obtained in [40] for the
Blum-Susarla estimator $f_n^*(x)$ defined by (3.2). These simulation results
indicated a range of values of $h_n$ which gave small estimated mean squared
errors of $\hat{f}_n(x)$ and $f_n^*(x)$ at fixed x. The maximum likelihood
criterion for selecting $h_n$ for a given censored sample is feasible for
$\hat{f}_n$ but does not seem to be tractable, even using numerical methods, for
$f_n^*$ due to the complications introduced by the term $H^*(x)$ in the
likelihood expression. The maximum likelihood approach will be used in the
following example for $\hat{f}_n$.

Following an approach similar to expressions (2.8) and (2.9) of Scott and
Factor [46], consider choosing $h_n$ to be a value of $h \ge 0$ which maximizes
the likelihood

$$L(h) = \prod_{i=1}^{n} \left[\hat{f}_n(x_i)\right]^{\delta_i} \left[\int_{x_i}^{\infty} \hat{f}_n(u)\,du\right]^{1-\delta_i}. \tag{6.1}$$


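Obviously, by definition of $\hat{f}_n$, the maximum of (6.1) is $+\infty$ at
$h = 0$. Hence, the following modified likelihood criterion is considered:

$$\underset{h \ge 0}{\text{maximize}}\quad L_1(h) = \prod_{k=1}^{n} \left[\hat{f}_{nk}(z_k)\right]^{\delta'_k} \left[\int_{z_k}^{\infty} \hat{f}_{nk}(u)\,du\right]^{1-\delta'_k}, \tag{6.2}$$

where

$$\hat{f}_{nk}(z_k) = \frac{1}{h} \sum_{\substack{j=1 \\ j \ne k}}^{n} s_j\, K\!\left(\frac{z_k - z_j}{h}\right).$$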

For the standard normal kernel $K(u) = (2\pi)^{-1/2} \exp(-u^2/2)$, the logarithm
of (6.2) becomes

$$\begin{aligned}
\log L_1(h) = {} & -\Big(\sum_{k=1}^{n} \delta'_k\Big) \log h \\
& + \sum_{k=1}^{n} \delta'_k \log\Big[\sum_{\substack{j=1 \\ j \ne k}}^{n} s_j (2\pi)^{-1/2} \exp\big(-(z_k - z_j)^2 / 2h^2\big)\Big] \\
& + \sum_{k=1}^{n} (1 - \delta'_k) \log\Big[\sum_{\substack{j=1 \\ j \ne k}}^{n} s_j \big(1 - \Phi((z_k - z_j)/h)\big)\Big],
\end{aligned} \tag{6.3}$$

where $\Phi$ denotes the standard normal distribution function. An approximate
(local) maximum of (6.3) with respect to h can be easily found by numerical
methods for a given set of censored observations, and this estimated h, denoted
by $\hat{h}_n$, can be used in (3.4) to calculate $\hat{f}_n(x)$.
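
The numerical maximization of (6.3) can be sketched as below. The sketch
assumes that the $z_k$ are the observations sorted in increasing order, that
$\delta'_k$ are the matching censoring indicators, and that $s_j$ is the jump
of the product-limit estimator at $z_j$, computed once from the full sample;
these readings are inferred from the form of (6.3) and are not confirmed by
this section. The function names, the grid search, and the data in the usage
lines are hypothetical and are not the switch data of [38].

```python
import math

def pl_jumps(z, delta):
    """Product-limit jumps for observations already sorted increasingly.

    Censored points get jump 0; ties are taken in the order given.
    """
    n = len(z)
    surv = 1.0
    jumps = []
    for i, d in enumerate(delta):
        at_risk = n - i
        jump = surv / at_risk if d else 0.0
        surv -= jump
        jumps.append(jump)
    return jumps

def log_L1(h, z, delta, s):
    """Leave-one-out log likelihood (6.3) with the standard normal kernel."""
    n = len(z)
    total = 0.0
    for k in range(n):
        if delta[k]:
            # Density term: (1/h) * sum_{j != k} s_j * K((z_k - z_j)/h)
            term = sum(s[j] * math.exp(-0.5 * ((z[k] - z[j]) / h) ** 2)
                       for j in range(n) if j != k)
            term /= h * math.sqrt(2.0 * math.pi)
        else:
            # Survival term: sum_{j != k} s_j * (1 - Phi((z_k - z_j)/h)),
            # using 1 - Phi(x) = erfc(x / sqrt(2)) / 2
            term = sum(s[j] * 0.5 * math.erfc((z[k] - z[j]) / (h * math.sqrt(2.0)))
                       for j in range(n) if j != k)
        if term <= 0.0:
            return float("-inf")
        total += math.log(term)
    return total

def best_bandwidth(z, delta, grid):
    """Grid value of h with the largest log_L1 (a crude numerical search)."""
    s = pl_jumps(z, delta)
    return max(grid, key=lambda h: log_L1(h, z, delta, s))

# Hypothetical sorted sample: 1 = observed failure, 0 = censored.
z = [0.5, 1.1, 1.6, 2.0, 2.7, 3.1]
delta = [1, 0, 1, 1, 0, 1]
h_hat = best_bandwidth(z, delta, [i / 100 for i in range(1, 101)])
```

Because the term j = k is left out, the criterion no longer diverges as h
shrinks: for very small h each uncensored point receives essentially zero
density from the remaining points, so the log likelihood tends to minus
infinity rather than plus infinity. A grid search is used only for
transparency; any one-dimensional optimizer could take its place.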


For this example of the density estimation procedure given by (6.3) and
(3.4), the life test data for n = 40 mechanical switches reported by Nair [38]
are used. Two failure modes, A and B, were recorded and Nair estimated the
survival function of mode A, assuming the random right-censorship model. Table
1 shows the 40 observations with corresponding $\delta_i$ values, where
$\delta_i = 1$ indicates failure mode A and $\delta_i = 0$ denotes a censored
value (or failure mode B). Using these data, the function $\log L_1(h)$ had a
maximum in the interval [0,1] at $\hat{h}_{40} \approx 0.18$. Hence,
$\hat{f}_{40}$ was computed from (3.4) with bandwidth 0.18. This estimate is
shown in Figure 1. This maximum likelihood approach to selecting $h_n$ does not
produce the smoothest estimate, but is one criterion that can be used.
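
Once a bandwidth has been chosen, evaluating the estimate on a grid is
straightforward. The sketch below assumes that (3.4) has the weighted kernel
form $\hat{f}_n(x) = h^{-1} \sum_j s_j K((x - z_j)/h)$ with $s_j$ the
product-limit jump at $z_j$, which is the form implied by (6.3) but is not
restated in this section. The function name, the observations, and the weights
are hypothetical; they are not the data of Table 1.

```python
import math

def kernel_density(x, z, s, h):
    """Weighted normal-kernel estimate (1/h) * sum_j s_j * K((x - z_j)/h)."""
    c = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return c * sum(sj * math.exp(-0.5 * ((x - zj) / h) ** 2)
                   for zj, sj in zip(z, s))

# Hypothetical observations and product-limit jumps (censored points get 0).
z = [0.5, 1.1, 1.6, 2.0, 2.7, 3.1]
s = [1 / 6, 0.0, 5 / 24, 5 / 24, 0.0, 5 / 12]

# Evaluate the curve on [0, 4] with the bandwidth 0.18 quoted in the text.
curve = [(i / 100, kernel_density(i / 100, z, s, 0.18)) for i in range(401)]
```

Censored observations carry zero weight and therefore contribute no bump of
their own; their effect enters through the larger weights given to later
uncensored observations. If the largest observation is censored, the weights
sum to less than one and so does the integral of the estimate.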

Shown also in Figure 1 are the modified kernel estimates calculated from
(3.6) with the "nearest neighbor" calculation of the bandwidth for the smoothing
parameter values $\alpha$ = 0.60 and 0.75. The estimate was also calculated for
$\alpha$ = 0.35 but was very close to the fixed bandwidth estimate $\hat{f}_{40}$
with h = 0.18 and, hence, is not shown. The modified estimator (3.6) with
$\alpha$ = 0.75 is pleasingly smooth, but with the small sample and only 17
uncensored observations, the value of $\alpha$ = 0.60 might be a compromise
between the very smooth ($\alpha$ = .75) and somewhat rough ($\alpha$ = .55)
estimates.